Polygenic risk score-based prediction of breast cancer risk in Taiwanese women with dense breast using a retrospective cohort study

Mammographic screening has contributed to a significant reduction in breast cancer mortality. Several studies have highlighted the correlation between breast density, as detected through mammography, and a higher likelihood of developing breast cancer. A polygenic risk score (PRS) is a numerical score that is calculated based on an individual's genetic information. This study aims to explore the potential role of PRSs as candidate markers for breast cancer development and to investigate the genetic profiles associated with clinical characteristics in Asian females with dense breasts. This retrospective cohort study integrated breast cancer screening, population genotyping, and cancer registry databases. The PRSs of the study cohort were estimated using genotyping data of 77 single nucleotide polymorphisms based on PGS000001 from the PGS Catalog. A subgroup analysis was conducted for females without breast symptoms. A higher proportion of breast cancer patients than controls fell in PRS Q4 (37.8% vs. 24.8%). Among dense breast patients with no symptoms, the high PRS group (Q4) consistently showed a significantly elevated breast cancer risk compared to the low PRS group (Q1–Q3) in both univariate (OR = 2.25, 95% CI 1.43–3.50, P < 0.001) and multivariate analyses (OR = 2.23, 95% CI 1.41–3.48, P < 0.001). This study extends breast cancer risk prediction by using common low-penetrance risk variants in a PRS model, which could be integrated into personalized screening strategies for Taiwanese females with dense breasts without prominent symptoms.


SNP Single nucleotide polymorphism
TPMI Taiwan Precision Medicine Initiative

Breast cancer is a major health concern in many countries, including Taiwan. According to the World Health Organization, 2.3 million women worldwide received a diagnosis of breast cancer in 2020 1. The 2020 Taiwan Cancer Registry Annual Report indicated breast cancer as the most prevalent type of cancer among Taiwanese women, representing 28.5% of all newly diagnosed cases of cancer in this population (total number of new cases: 15,205). The incidence of breast cancer in Taiwan has exhibited a consistent upward trend over the past 3 decades (from 32.8 per 100,000 women in 1992 to 74.3 per 100,000 women in 2020), indicating its status as an increasing public health concern 2. This trend underscores the importance of effective screening efforts. In July 2004, the Taiwanese government implemented a nationwide screening program involving biennial mammography for women aged 40-69. Such screening results in significant reductions in breast cancer-related mortality 3. Asian females are more likely to have dense breast tissues than other women; mammography-detected breast density has been reported to be correlated with breast cancer risk 4. Among Asian women, the risk of breast cancer increases with increasing breast density 5.
From a clinical standpoint, multigene cancer predisposition panels and germline genetic testing serve as valuable tools that enable clinicians to counsel women regarding their individual risk of breast cancer. However, how to effectively translate genetic information into evidence-based clinical decisions remains unclear. Furthermore, the added benefits of breast cancer prevention and surveillance strategies personalized in accordance with carrier status for moderate-penetrance genes are less clear than those of the strategies for high-penetrance genes, such as BRCA1 and BRCA2 6. A polygenic risk score (PRS) is a numerical score that is calculated on the basis of an individual's genetic data, specifically information on DNA sequence variants, to estimate the risk of a particular disease or condition. PRSs are typically derived from large-scale genetic studies, such as genome-wide association studies (GWAS), which identify genetic variants that are associated with a particular disease or trait 7. To calculate a PRS, genetic variants that have been associated with the disease or trait of interest are weighted according to their effect sizes, which represent the strength of the association between the genetic variant and the disease or trait. Then, the weighted genetic variants are summed to generate an overall score that represents an individual's genetic risk for the disease or trait 8,9. A PRS can be used for various purposes, such as the prediction of an individual's risk of developing a disease, risk-based stratification of individuals, and identification of high-risk individuals who may benefit from early screening or targeted prevention strategies 10.
A PRS may reveal an individual's profile of low-penetrance risk variants for a specific disease. However, the use of a PRS for estimating breast cancer risk is typically restricted to gene-only models, and PRSs are rarely integrated with breast cancer screening (BCS) databases. BCS databases encompass a wide range of risk factors closely linked to breast cancer development. In this study, we evaluated the ability of a PRS to predict breast cancer risk in women with dense breasts. In addition, we investigated whether certain single nucleotide polymorphisms (SNPs) can serve as indicators of breast cancer risk. Furthermore, we calculated the PRSs of our study cohort to identify genetic factors associated with clinical characteristics in at-risk females and with the risk of developing breast cancer.

Results
The clinical characteristics of the study cohort are summarized in Table 1. The cohort included 6335 women. The mean age of the study cohort was 57.4 ± 7.6 (range: 40-71) years. Most of the included women were well-educated, with > 70% having received at least a high school education. Of the cohort, 8.9% had a family history of breast cancer: 3.3% had a first-degree relative with breast cancer, and 5.6% had a second-degree relative with the disease. Approximately 87% of the women had a history of pregnancy, with the average age at first birth being 26.6 ± 4.5 years. Women with breast cancer had a later age at first birth (27.6 ± 4.7 years) than control individuals without breast cancer (26.5 ± 4.5 years; P = 0.023). Most women in the cohort had a parity of one or two, and approximately 46.1% had breastfed. Of the study cohort, 75.7% reported experiencing menopause; this proportion was significantly lower among women with breast cancer (65.8%) than among control individuals (75.9%; P = 0.014). However, the age at menopause did not differ significantly between the two subgroups (approximately 49.6 ± 5.0 years). Approximately 10.3% and 13.0% of the study cohort reported previous use of oral contraceptives and hormone replacement therapy, respectively. In addition, 7.4% of the study cohort had a history of breast surgery. Furthermore, approximately 33% of the women had benign breast disease, either non-proliferative or proliferative without atypia. Only 18.6% of the women reported regularly conducting self-examinations. Approximately 6.6% of the included women had mastalgia or palpable breast lesions, and 7.3% had a history of any other type of cancer. Regarding diagnostic evaluations, 28.5% of the women had undergone mammography once and 12.6% had undergone it twice. The prevalence of breast cancer family history, breast surgery history, and breast symptoms was higher among women with breast cancer than among control (non-breast cancer) individuals. Notably, the prevalence of mastalgia or palpable breast lesions was significantly higher among women with breast cancer than among the control individuals (P < 0.001).
The distribution of PRSs in the study cohort is depicted in Fig. 1a. Patients with breast cancer had higher PRSs than did control individuals. We further divided the cohort into four quartiles (Q1-Q4; Table 2) and found that the proportion of women in PRS Q4 was higher among those with breast cancer than among control individuals (37.8% vs. 24.8%, respectively). The risk of breast cancer in PRS Q4 was significantly higher than that in PRS Q1 (OR: 1.93; 95% CI 1.16-3.31; P = 0.013); however, no significant difference in breast cancer risk was noted between PRS Q1 and PRS Q2 or Q3. We further divided the patients into a high-PRS (PRS Q4) subgroup and a low-PRS (PRS Q1-Q3) subgroup; the risk of breast cancer was significantly higher in the high-PRS subgroup than in the low-PRS subgroup (OR: 1.85; 95% CI 1.25-2.71; P = 0.002). Additionally, we conducted multivariate analyses for both PRS groupings, adjusting for breast cancer-associated risk factors, including previous pregnancy, menopause status, oral contraceptive use, and hormone replacement therapy. The results remained consistent with the univariate analysis, demonstrating that the PRS continued to contribute robustly to breast cancer risk estimation after adjustment for these risk factors.
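The unadjusted high-versus-low PRS comparison can be approximated by hand from a 2 × 2 table. The following Python sketch is illustrative only: the counts are back-calculated and rounded from the reported percentages (37.8% of the 111 cases and 24.8% of the remaining 6224 women in Q4), so they are assumptions rather than the values in Table 2, and the function name odds_ratio_wald is hypothetical. It uses a Wald interval on the log odds ratio, which may differ slightly from the logistic regression output used in the study.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Odds ratio and Wald 95% CI for a 2 x 2 table.

    a: high-PRS cases,    b: low-PRS cases,
    c: high-PRS controls, d: low-PRS controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Assumed counts, rounded from the reported percentages (not read from Table 2).
cases_total = 111
controls_total = 6335 - cases_total
cases_q4 = round(0.378 * cases_total)
controls_q4 = round(0.248 * controls_total)

odds_ratio, ci_low, ci_high = odds_ratio_wald(
    cases_q4,
    cases_total - cases_q4,
    controls_q4,
    controls_total - controls_q4,
)
print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

With these assumed counts, the result should land close to the reported unadjusted estimate (OR: 1.85; 95% CI 1.25-2.71); small differences are expected because the counts are rounded.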
The predictive performance of the PRS and breast cancer-related clinical characteristics was assessed (Table 3). The risk of breast cancer was significantly higher in the high-PRS subgroup than in the low-PRS subgroup; this result was obtained using both univariate and multivariate models. Notably, the model including breast cancer risk-associated clinical characteristics outperformed the PRS-only model; Harrell's C-index increased from 0.565 (95% CI 0.520-0.611) to 0.727 (95% CI 0.677-0.777) after incorporation of clinical characteristics and relevant factors in our statistical model. The risk of breast cancer was significantly lower in patients with benign breast disease (OR: 0.51; 95% CI 0.31-0.81; P = 0.006) and menopause (OR: 0.48; 95% CI 0.26-0.88; P = 0.018), but significantly higher in those with mastalgia or a palpable breast lesion (OR: 7.03; 95% CI 4.38-11.1; P < 0.001).
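Table 3 reports predictive performance as Harrell's C-index. For a binary outcome without censoring, this index reduces to the proportion of case-control pairs in which the case receives the higher predicted score (ties counted as one half), so 0.5 corresponds to chance and 1.0 to perfect discrimination. A minimal Python sketch of that pairwise definition follows; the function name c_index_binary and the toy scores are hypothetical and are not taken from the study data or the authors' software.

```python
def c_index_binary(case_scores, control_scores):
    """Share of case-control pairs ranked correctly; ties count as 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                concordant += 1.0
            elif s_case == s_ctrl:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Toy scores invented for illustration only.
toy_cases = [0.9, 0.8, 0.4]
toy_controls = [0.7, 0.3, 0.2, 0.1]
toy_c = c_index_binary(toy_cases, toy_controls)
print(round(toy_c, 3))
```

The double loop is quadratic in cohort size; it is written for readability rather than speed.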
Because women with breast cancer differed significantly from control individuals in terms of clinical presentation, we further analyzed the distribution of PRSs in individuals stratified by breast symptoms. Overall, in women with no apparent breast symptoms, the distribution of PRSs among those with breast cancer exhibited a right-sided and narrow peak; their PRSs were higher than those of control individuals (Fig. 1b). However, when analyzing women with mastalgia or a palpable lesion (421 patients), we did not observe the same right-sided and narrow pattern of PRS distribution (Fig. 1c). The results presented in Fig. 1b, c suggest that the association between PRS and breast cancer risk varies depending on the presence of specific breast clinical characteristics. Individuals without breast symptoms may be overlooked, or their risk of breast cancer may be underestimated; consequently, they may be less likely to be screened for breast cancer. Therefore, we conducted a subgroup analysis of women without breast symptoms to assess the performance of the PRS model in predicting the risk of breast cancer in women with dense breasts but no symptoms. For this population, the predictive performance (C-index) of the PRS-only model was 0.589 (95% CI 0.534-0.644) and that of the multivariate model (PRS plus clinical characteristics and relevant factors) was 0.682 (95% CI 0.623-0.741). Among 5914 women with dense breasts but no symptoms, those with high PRSs (Q4) consistently had a significantly higher risk of breast cancer than did women with low PRSs (Q1-Q3); this finding was obtained using both univariate (OR: 2.25; 95% CI 1.43-3.50; P < 0.001) and multivariate analyses (OR: 2.23; 95% CI 1.41-3.48; P < 0.001). Furthermore, menopausal women (OR: 0.37; 95% CI 0.18-0.76; P = 0.007) consistently showed a decreased breast cancer risk.

Discussion
Our study findings revealed that the association between PRS and breast cancer risk may vary depending on the presence of specific breast symptoms and clinical characteristics. Because women without breast symptoms and clinical characteristics may be neglected in screening, a PRS may improve risk prediction when screening these individuals.
The incorporation of cancer-related genetic variants in a PRS model may help evaluate individuals' genetic susceptibility to cancer 11,12. In a study investigating the association between individuals' PRSs and breast cancer risk, cancer susceptibility was determined on the basis of SNPs in women carrying pathogenic mutations in BRCA1 and BRCA2 13. The aforementioned study indicated that the incorporation of PRSs into risk prediction models can improve the calculation of personalized risk estimates for individuals carrying mutations in BRCA1 and BRCA2; in addition, it can guide clinical decisions regarding the management of cancer risk 13. Incorporating PRS calculation into breast cancer risk estimation may help health-care providers identify high-risk individuals and implement effective prevention and early detection strategies, such as early and frequent screenings and preventative interventions 11.
A key point addressed in the present study is the potential effects of integrating PRS evaluation into screening protocols for estimating the risk of breast cancer. In a study that compared the use of family history data and PRSs for biennial screening in individuals aged 50-74 years, the combined method resulted in the highest gain in life-years (29%) and averted breast cancer-related deaths 14. These findings suggest that combining the evaluation of PRSs with the assessment of other risk factors, such as family history of breast cancer, is beneficial. We found that PRS evaluation has the potential to improve the prediction of breast cancer risk in women with dense breasts but no prominent symptoms; therefore, combining the evaluation of PRSs with the assessment of breast disease history and breast symptoms may also contribute to early detection of breast cancer.
More Asian women than Western women have dense breasts 15. In our initial cohort, < 1% of all women were diagnosed as having fatty breasts. Dense breast tissue is a normal and common finding on mammograms and refers to breasts with higher proportions of glandular and connective tissues than of fatty tissue 16. Dense breast tissue is identified through mammography because no physical signs or symptoms are associated with such tissue 17. No direct relationship has been reported between dense breasts and mastalgia or palpable lesions 18. Although our findings indicate that women with dense breasts but no symptoms have PRSs that differentiate them from those with mastalgia or palpable lesions, the evidence is insufficient to determine whether genetic differences exist between women with mastalgia or palpable lesions and those without these conditions. Nonetheless, women with dense breasts are at higher risk for breast cancer; supplementary screening tests may be necessary for early cancer detection and treatment in this population 17,19. Therefore, personalized screening strategies may facilitate the prevention and early detection of breast cancer, particularly in high-risk women with dense breasts 19.
The use of multiple-ancestry PRSs leverages genetic ancestral composition to extend the applicability of polygenic risk prediction beyond European populations, offering women of diverse and mixed ancestries an opportunity to receive additional personalized treatment 20. However, incorporating PRS evaluation into clinical screening requires consideration of potential ethical and social issues pertaining to genetic testing and patient privacy. In addition, PRS evaluation may not be equally effective for all women, particularly those from non-European populations, among whom the genetic architecture of breast cancer may differ.
A limitation of this study was that it was conducted using data from a single institute, which limits the generalizability of our findings. Further research is needed to evaluate the optimal use of PRSs in clinical practice, including the potential benefits and risks associated with incorporating PRSs into breast cancer screening programs. In the future, large-scale, multicenter studies that incorporate PRS evaluation into breast cancer screening and prevention strategies should be conducted; such research may reveal the predictive value of PRSs in different populations. In addition, studies should be conducted to identify the best approaches for communicating PRS results to patients and healthcare providers. Despite the aforementioned limitations, our study highlights the potential benefits of incorporating PRS evaluation into programs for breast cancer risk prediction and management, particularly in populations that may be underrepresented or overlooked in traditional risk assessment methods. Further research is needed to explore the optimal use of PRSs in clinical practice and to fully understand their role in breast cancer prevention and treatment. This study emphasized the potential benefits of using PRSs in the prediction of breast cancer risk in women with dense breasts, a group that may not manifest prominent symptoms. Integrating the evaluation of PRSs with the assessment of menopause and breast symptoms may contribute substantially to the early detection of breast cancer.

Conclusion
Breast cancer screening has undergone a shift from a general approach to a personalized, risk-based approach. Identifying the key factors that contribute to breast cancer susceptibility can help us develop screening protocols based on age, breast density, and other factors. Unlike the contemporary breast cancer screening guidelines that recommend the use of family history as the only risk factor, our proposed model identifies both genetic and clinical risk factors using big data analysis, with inputs from electronic health records, cancer screening data, and the cancer registry database of a single center. We extended breast cancer risk prediction by using common low-penetrance risk variants and constructing a PRS model, which could be integrated into personalized screening strategies for Taiwanese women with dense breasts without prominent symptoms. However, the clinical utility of PRSs in guiding breast cancer screening and prevention remains to be comprehensively established; therefore, further research is needed to determine the optimal use of PRSs in clinical practice.

Study cohort
This retrospective cohort study was approved by the ethics committee of our institution (registered number CE23245B). All methods were performed in accordance with the relevant guidelines and regulations. We included women who had undergone mammography screening and excluded those who had fatty breasts, were screened in an out-of-hospital setting, or lacked Taiwan Precision Medicine Initiative (TPMI) genotyping data. The study flowchart is presented in Fig. 2. From the local BCS database, we retrieved the data of 10,403 females who had undergone mammography between February 2017 and November 2022. Subsequently, we linked the BCS cohort to the electronic health records of Taichung Veterans General Hospital (TC-VGH) to obtain TPMI genotyping (registered number: SF19153A) and cancer registry data. In total, 10,403 local women who had undergone mammography were linked to the TC-VGH electronic health records database, which contained the data of 40,166 women. Of the 10,403 women, 6877 were successfully mapped to the BCS cohort. We excluded 51 women with fatty breasts, 418 women screened in an out-of-hospital setting (resulting in a lack of study information), and 73 women without TPMI genotyping data. Finally, a total of 6335 women were included in this study. Of them, 111 women received a diagnosis of breast cancer during the study period; the remaining women, who were at risk of breast cancer, were included as control individuals.
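The cohort flow described above can be checked with simple arithmetic. The short Python sketch below only restates the numbers given in the text; it assumes that the three exclusion groups do not overlap, which the text does not state explicitly, and the variable names are illustrative.

```python
# Numbers as reported in the study cohort description.
mapped = 6877
excluded = {
    "fatty breasts": 51,
    "out-of-hospital screening": 418,
    "no TPMI genotyping data": 73,
}

# Assumes each woman is counted in at most one exclusion group.
included = mapped - sum(excluded.values())
cases = 111
controls = included - cases

print(f"included = {included}, cases = {cases}, controls = {controls}")
```

Under that assumption, the subtraction reproduces the reported final cohort of 6335 women.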

Polygenic risk score (PRS)
The PRSs of the study cohort were estimated using TPMI genotyping data. To evaluate the PRSs, 77 SNPs were selected as candidates; all had previously been identified as susceptibility loci for various types of breast cancer, and all had reached genome-wide significance (P < 5 × 10^−8) 21. The polygenic risk effects of these 77 candidate SNPs on breast cancer have been previously validated in a large European cohort study, leading to the breast cancer score catalogued as PGS000001 (PGS name: PRS77_BC). The candidate SNPs used in this study are listed in Supplementary Table S1. The PRS for the 77 SNPs was derived using Eq. (1):
$$\mathrm{PRS} = \sum_{n=1}^{77} \beta_n x_n \tag{1}$$
where $\beta_n$ is the per-allele log odds ratio (OR) estimated using logistic regression for breast cancer risk associated with SNP $n$, and $x_n$ is the allele dosage for SNP $n$. The distribution of PRSs in the study cohort is illustrated using a ridge plot. The PRSs of the included individuals were divided into four quartiles, and the risk of breast cancer in each PRS quartile was estimated through binomial logistic regression.
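The weighted sum and the quartile split described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the three toy effect sizes, and the toy dosages are all hypothetical, and the quartile cut points come from the standard library's default quantile method, which may differ from the method used in the study.

```python
from bisect import bisect_left
from statistics import quantiles


def polygenic_risk_score(betas, dosages):
    """Weighted sum of allele dosages: sum over SNPs of beta_n * x_n."""
    if len(betas) != len(dosages):
        raise ValueError("betas and dosages must have the same length")
    return sum(beta * dosage for beta, dosage in zip(betas, dosages))


def assign_quartiles(scores):
    """Label each score Q1..Q4 using the cohort's own quartile cut points."""
    cuts = quantiles(scores, n=4)  # three cut points
    # A score equal to a cut point falls in the lower quartile.
    return [f"Q{bisect_left(cuts, score) + 1}" for score in scores]


# Toy per-allele log odds ratios for three SNPs (invented, not from PGS000001).
toy_betas = [0.10, -0.05, 0.20]
# Toy allele dosages (0, 1, or 2 copies of the effect allele) per woman.
toy_dosages = [[0, 1, 2], [2, 2, 0], [1, 0, 1], [0, 0, 0]]

toy_scores = [polygenic_risk_score(toy_betas, d) for d in toy_dosages]
print([round(score, 2) for score in toy_scores])
print(assign_quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
```

In practice the dosages would come from the genotyping data and the weights from the published score file; women labelled Q4 would form the high-PRS group and those labelled Q1-Q3 the low-PRS group.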

Statistical analysis
The clinical characteristics of the study cohort are presented using mean ± standard deviation or frequency and percentage values. The distribution of characteristics between women with breast cancer and control individuals was assessed using the independent two-sample t-test, chi-square test, and Fisher's exact test. The associations of the PRS and relevant characteristics with the risk of breast cancer in the study cohort were investigated through

Figure 1. Distribution of PRS in the study cohort. (a) In the overall cohort, (b) among women with no breast symptoms, and (c) among women with mastalgia or palpable lesion.

Figure 2. Study flowchart. BCS breast cancer screening, TC-VGH EHR Taichung Veterans General Hospital electronic health records, TPMI Taiwan Precision Medicine Initiative, CRD cancer registry database.

Table 1. Clinical characteristics of the study cohort (n = 6335). Significant p-values are in bold.

Table 2. Association of the PRS quartiles with breast cancer risk. Significant p-values are in bold. OR odds ratio, CI confidence interval. a Unadjusted results were estimated using univariate analysis. b Results were adjusted for relevant factors, including previous pregnancy, menopause status, oral contraceptive use, and hormone replacement therapy.

Table 3. Predictive performance of PRS for breast cancer risk. Significant p-values are in bold. BC breast cancer, OC oral contraceptive use, HRT hormone replacement therapy. a Harrell's C-index for PRS subgroup in univariate model. b Harrell's C-index for multivariate model with all retained variables.